Demonstration of R Juneja mapping scripts

First things first, load the functions into your environment.

# Load the mapping functions
source("MappingFunctions.R")
ls()
[1] "MapDF"      "MapDF_FST"  "MapVCFLoci"

There should now be three functions in your environment, MapDF, MapDF_FST and MapVCFLoci. Each one uses the same algorithm to map a set of markers that have been mapped to the current Ae. aegypti genome (AaegL1/2/3) and converts them into Juneja (http://journals.plos.org/plosntds/article?id=10.1371/journal.pntd.0002652) assembly physically mapped markers where possible. You will also need to make sure the file JunejaGeneticAssembly.csv is in your working directory.

Mapping from a VCF file

If you have a VCF files with markers mapped to the AaegL genome, you just need to provide the path to where the VCF is and it will return a dataframe with Juneja mapped markers. This function requires you have the package VariantAnnotation from Bioconductor installed. It should install this automatically but do let me know if it doesn’t work. Let’s use the VCF file published in Gordana’s Rio paper (http://onlinelibrary.wiley.com/doi/10.1111/eva.12301/full) as an example

BrazilMappedLoci <- MapVCFLoci("Rasic2015Rio.vcf")
Total Loci to be mapped:  13193 
Total assigned to chromosomes: 9247 
Total mapped to physical positions: 8677 
Total assigned to misassembled contigs: 1223 
Total unknown is 3946 
Total assigned to chrom 1: 1955 | chromosome only: 94 
Total assigned to chrom 2: 3989 | chromosome only: 152 
Total assigned to chrom 3: 3303 | chromosome only: 324 
Time taken: 8.560978 seconds.
head(BrazilMappedLoci)
         CHROM     POS    ID  JunSC JunChr     JunBP  JuncM
1 supercont1.1  148364 33405 sc1.1b      1 123508582 24.396
2 supercont1.1  148385 33405 sc1.1b      1 123508603 24.396
3 supercont1.1  148433 33405 sc1.1b      1 123508651 24.396
4 supercont1.1 1186392 33369   <NA>     NA        NA     NA
5 supercont1.1 1186412 33369   <NA>     NA        NA     NA
6 supercont1.1 1186620 33370   <NA>     NA        NA     NA

Too easy.

Mapping from a data frame

The next function is handy if you have a table type output that has the supercontig identifiers and the basepair positions within them. I created an output like this by using the following command with vcftools: vcftools --vcf Rasic2015Rio.vcf --weir-fst-pop TB.txt --weir-fst-pop UR.txt --out TBvsUR

TBvsURFST <- read.delim("TBvsUR.weir.fst")
head(TBvsURFST)
         CHROM     POS WEIR_AND_COCKERHAM_FST
1 supercont1.1  148364                    NaN
2 supercont1.1  148385            -0.02347510
3 supercont1.1  148433                    NaN
4 supercont1.1 1186392            -0.00523313
5 supercont1.1 1186412            -0.02090100
6 supercont1.1 1186620             0.02145190
# I usually rename the last column as it is a bit unwieldy...
colnames(TBvsURFST) <- c("CHROM", "POS", "FST")
TBvsURFST <- MapDF(TBvsURFST)
Total Loci to be mapped:  13193 
Total assigned to chromosomes: 9247 
Total mapped to physical positions: 8677 
Total assigned to misassembled contigs: 1223 
Total unknown is 3946 
Total physically assigned to chrom 1: 1955 | chromosome only: 94 
Total physically assigned to chrom 2: 3989 | chromosome only: 152 
Total physically assigned to chrom 3: 3303 | chromosome only: 324 
Time taken: 19.85283 seconds.
head(TBvsURFST)
         CHROM     POS         FST JunChr     JunBP  JunSC  JuncM
1 supercont1.1  148364         NaN      1 123508582 sc1.1a 24.396
2 supercont1.1  148385 -0.02347510      1 123508603 sc1.1a 24.396
3 supercont1.1  148433         NaN      1 123508651 sc1.1a 24.396
4 supercont1.1 1186392 -0.00523313     NA        NA   <NA>     NA
5 supercont1.1 1186412 -0.02090100     NA        NA   <NA>     NA
6 supercont1.1 1186620  0.02145190     NA        NA   <NA>     NA
# To get just the mapped positions
TBvsURFST <- TBvsURFST[complete.cases(TBvsURFST), ]

Cool so what fun things can we do with the mapped data frame? Well I like making pretty interactive plots with a package called manhattanly.

# if you haven't installed it yet uncomment the next line and run it
# install.packages('manhattanly')
library(manhattanly)
library(plotly)
# Get some pretty colours from RColorBrewer, if you don't have this package
# install.packages('RColorBrewer')
library(RColorBrewer)
colours <- brewer.pal(4, "Dark2")

m <- manhattanr(x = TBvsURFST, chr = "JunChr", bp = "JunBP", p = "FST", logp = FALSE, 
    annotation1 = "JunSC", annotation2 = "JunBP")

manhattanly(m, genomewideline = F, col = c(colours[1], colours[4], colours[3]), 
    ylab = "FST", ylim = c(-0.05, 1), suggestiveline = mean(TBvsURFST$FST, na.rm = T), 
    suggestiveline_color = "yellow", title = "") %>% layout(yaxis = list(range = c(-0.06, 
    1)))

Mapping when you still want to visualise the unassigned contigs

So now you’re probably thinking, ok, that’s pretty and all but we chucked out >5000 markers, what if I want to see them on my visualisation? That’s where the third mapping function comes in. This one creates some false chromosomes so that we can see all of the markers. The function maps to chromosomes 1, 2 and 3 as before, but maps markers mapped to contigs that have been assigned to chromosomes but haven’t had their exact locations mapped to chromosomes 4, 5, and 6, and all other markers are assigned to chromosome 7.

# Let's relaod the dataframe so it doesn't get too messy
TBvsURFST_UN <- read.delim("TBvsUR.weir.fst")
colnames(TBvsURFST_UN) <- c("CHROM", "POS", "FST")
TBvsURFST_UN <- MapDF_FST(TBvsURFST_UN)
Total Loci to be mapped:  13193 
Total assigned: 11332 
Total assigned to misassembled contigs: 1223 
Total unknown is 0 
Total assigned to chrom 1: 1861 
Total assigned to chrom 2: 3837 
Total assigned to chrom 3: 2979 
Time taken: 18.27814 seconds.
head(TBvsURFST_UN)
         CHROM     POS         FST JunChr     JunBP        JunSC  JuncM
1 supercont1.1  148364         NaN      1 123508582       sc1.1a 24.396
2 supercont1.1  148385 -0.02347510      1 123508603       sc1.1a 24.396
3 supercont1.1  148433         NaN      1 123508651       sc1.1a 24.396
4 supercont1.1 1186392 -0.00523313      7   2186392 supercont1.1     NA
5 supercont1.1 1186412 -0.02090100      7   3372804 supercont1.1     NA
6 supercont1.1 1186620  0.02145190      7   4559424 supercont1.1     NA
m <- manhattanr(x = TBvsURFST_UN, chr = "JunChr", bp = "JunBP", p = "FST", logp = FALSE, 
    annotation1 = "JunSC", annotation2 = "JunBP")

manhattanly(m, genomewideline = F, col = c(colours[1], colours[4], colours[3], 
    colours[1], colours[4], colours[3], colours[2]), ylab = "FST", labelChr = c(1:3, 
    "U1", "U2", "U3", "Unassigned"), ylim = c(-0.05, 1), suggestiveline = mean(TBvsURFST_UN$FST, 
    na.rm = T), suggestiveline_color = "yellow", title = "") %>% layout(yaxis = list(range = c(-0.06, 
    1)))

If there are a lot of unassigned it can take over the graph a bit but the good thing about interactivity is that you can zoom in and out on areas you’re interested in. Static versions of these graphs can be created using the package qqman.

Hopefully these functions will make your life easier when trying to investigate physically mapped Ae. aegypti supercontigs on the Juneja 2014 assembly.

2016-10-29